Effectiveness of Message Strip-mining for Regular and Irregular Communication
Authors
Abstract
Languages such as High Performance Fortran are used to implement parallel algorithms by distributing large data structures across a multicomputer system. To hide communication behind computation, we introduce an optimization scheme, message strip-mining. With this scheme, the communication overhead is almost completely overlapped with the subsequent computation. We have implemented the proposed scheme for the redistribution of arrays (regular communication) and for an executor for indirect accesses (irregular communication), achieving speedups of 3.5 for the redistribution of a 2560 × 2560 array and 2.6 for an executor that collects 5 × 10^5 data elements per processor.
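The scheme described above is implemented inside an HPF runtime, which is not reproduced here; the following is only a minimal C/MPI sketch of the general strip-mining idea. The names strip_mined_exchange, process_strip, the STRIP size, and the single-peer exchange are illustrative assumptions, not the paper's code: each large transfer is split into strips, and the transfer of strip s+1 is posted before the computation on strip s, so communication overlaps with computation.

/*
 * Minimal sketch of message strip-mining with MPI (assumption: the paper
 * targets an HPF runtime, not MPI, but the overlap idea is the same).
 * A large transfer is split into fixed-size strips; while strip s+1 is in
 * flight, the computation on the already-received strip s proceeds,
 * hiding communication latency behind computation.
 */
#include <mpi.h>

#define STRIP 4096   /* strip size in elements (a tuning parameter) */

/* Hypothetical stand-in for the per-strip computation. */
static void process_strip(double *buf, int n) {
    for (int i = 0; i < n; i++)
        buf[i] *= 2.0;
}

void strip_mined_exchange(double *sendbuf, double *recvbuf, int total,
                          int peer, MPI_Comm comm)
{
    int nstrips = (total + STRIP - 1) / STRIP;
    MPI_Request sreq[2], rreq[2];   /* double-buffered requests */

    /* Post communication for the first strip before any computation. */
    int first = (total < STRIP) ? total : STRIP;
    MPI_Irecv(recvbuf, first, MPI_DOUBLE, peer, 0, comm, &rreq[0]);
    MPI_Isend(sendbuf, first, MPI_DOUBLE, peer, 0, comm, &sreq[0]);

    for (int s = 0; s < nstrips; s++) {
        int next = s + 1;
        if (next < nstrips) {
            /* Start the next strip's transfer ... */
            int noff = next * STRIP;
            int nlen = (noff + STRIP <= total) ? STRIP : total - noff;
            MPI_Irecv(recvbuf + noff, nlen, MPI_DOUBLE, peer, 0, comm,
                      &rreq[next % 2]);
            MPI_Isend(sendbuf + noff, nlen, MPI_DOUBLE, peer, 0, comm,
                      &sreq[next % 2]);
        }
        /* ... while computing on the strip that has already arrived. */
        int off = s * STRIP;
        int len = (off + STRIP <= total) ? STRIP : total - off;
        MPI_Wait(&rreq[s % 2], MPI_STATUS_IGNORE);
        process_strip(recvbuf + off, len);
        MPI_Wait(&sreq[s % 2], MPI_STATUS_IGNORE);
    }
}

The same pattern applies to both cases measured in the paper: for array redistribution the strips are contiguous sections of the redistributed array, while for the irregular executor they are sections of the gathered (indirectly addressed) data.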
Similar papers
An Efficient Uniform Run-time Scheme for Mixed Regular-Irregular Applications
Almost all applications containing indirect array addressing (irregular accesses) have a substantial number of direct array accesses (regular accesses) too. A conspicuous percentage of these direct array accesses usually require inter-processor communication for the applications to run on a distributed memory multicomputer. This study highlights how lack of a uniform representation and lack of ...
Message Strip-Mining Heuristics for High Speed Networks
In this work we investigate how the compiler technique of message strip mining performs in practice on contemporary high performance networks. Message strip mining attempts to reduce the overall cost of communication in parallel programs by breaking up large message transfers into smaller ones that can be overlapped with computation. In practice, however, network resource constraints may negate...
A Data Reorganization Technique for Improving Data Locality of Irregular Applications in Software Distributed Shared Memory
Irregular applications are characterized by highly irregular and fine-grained data referencing patterns. When there is poor locality between the fine-grained data, serious false sharing can occur, which has largely contributed to poor performance of irregular applications on page-based software distributed shared memory (DSM) systems. Partitioning data in irregular applications to improve data local...
Optimizing Partitioned Global Address Space Programs for Cluster Architectures
Optimizing Partitioned Global Address Space Programs for Cluster Architectures, by Wei-Yu Chen, Doctor of Philosophy in Computer Science, University of California, Berkeley, Professor Katherine A. Yelick, Chair. Unified Parallel C (UPC) is an example of a partitioned global address space language for high performance parallel computing. This programming model enables applications to be written in a s...
Parallelizing Irregular Applications through the YAPPA Compilation Framework
Modern High Performance Computing (HPC) clusters are composed of hundreds of nodes integrating multicore processors with advanced cache hierarchies. These systems can reach several petaflops of peak performance, but are optimized for floating-point-intensive applications and regular, localizable data structures. The network interconnection of these systems is optimized for bulk, synchronous tra...